3 research outputs found

    AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction

    In this work, we present a multimodal solution to the problem of 4D face reconstruction from monocular videos. 3D face reconstruction from 2D images is an under-constrained problem due to depth ambiguity. State-of-the-art methods try to solve this problem by leveraging visual information from a single image or video, whereas 3D mesh animation approaches rely more on audio. However, in most cases (e.g. AR/VR applications), videos include both visual and speech information. We propose AVFace, which incorporates both modalities and accurately reconstructs the 4D facial and lip motion of any speaker without requiring any 3D ground truth for training. A coarse stage estimates the per-frame parameters of a 3D morphable model, followed by a lip refinement, and then a fine stage recovers facial geometric details. Because its transformer-based modules capture temporal audio and video information, our method is robust when either modality is insufficient (e.g. face occlusions). Extensive qualitative and quantitative evaluation demonstrates the superiority of our method over the current state-of-the-art.
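    The abstract describes transformer-based modules that fuse temporal audio and video information so that one modality can compensate when the other is degraded. The sketch below is a hedged, illustrative toy of that kind of fusion, not the paper's actual architecture: all names, shapes, and the single cross-attention step are assumptions for illustration.

    ```python
    import numpy as np

    # Toy cross-modal attention (an illustrative assumption, not AVFace's
    # architecture): video-frame features act as queries over audio-frame
    # features, so frames where the face is occluded can still draw on
    # speech information from nearby timesteps.

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def cross_attention(video_feats, audio_feats):
        """video_feats: (T, d) queries; audio_feats: (T, d) keys and values."""
        d = video_feats.shape[-1]
        scores = video_feats @ audio_feats.T / np.sqrt(d)  # (T, T) frame-to-frame affinity
        return softmax(scores, axis=-1) @ audio_feats      # audio context per video frame

    rng = np.random.default_rng(0)
    T, d = 8, 16
    video = rng.standard_normal((T, d))  # stand-in per-frame visual features
    audio = rng.standard_normal((T, d))  # stand-in per-frame speech features
    fused = cross_attention(video, audio)
    print(fused.shape)  # (8, 16): one audio-informed context vector per video frame
    ```

    Each output row is a convex combination of audio features, so even a fully occluded video frame receives a meaningful speech-derived signal.
    
    
    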

    SIDER: Single-Image Neural Optimization for Facial Geometric Detail Recovery

    We present SIDER (Single-Image neural optimization for facial geometric DEtail Recovery), a novel photometric optimization method that recovers detailed facial geometry from a single image in an unsupervised manner. Inspired by classical coarse-to-fine optimization techniques and recent advances in implicit neural representations of 3D shape, SIDER combines a geometry prior based on statistical models with Signed Distance Functions (SDFs) to recover facial details from single images. First, it estimates a coarse geometry using a morphable model represented as an SDF. Next, it reconstructs facial geometry details by optimizing a photometric loss with respect to the ground-truth image. In contrast to prior work, SIDER does not rely on any dataset priors and does not require additional supervision from multiple views, lighting changes, or ground-truth 3D shape. Extensive qualitative and quantitative evaluation demonstrates that our method achieves state-of-the-art results on facial geometric detail recovery using only a single in-the-wild image.
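    The two-stage idea in this abstract (a coarse statistical-model fit, then per-point detail recovered by minimizing a photometric loss) can be illustrated with a minimal 1D toy. This is a hedged sketch of the coarse-to-fine principle only: the polynomial "prior", the per-sample detail offsets, and the smoothness penalty are stand-in assumptions, not SIDER's SDF-based pipeline.

    ```python
    import numpy as np

    # Stage 0: a target "image" = smooth underlying shape + fine detail.
    x = np.linspace(0.0, 1.0, 200)
    target = np.sin(2 * np.pi * x) + 0.05 * np.sin(40 * np.pi * x)

    # Stage 1: coarse fit with a low-dimensional polynomial "statistical model".
    coarse = np.polyval(np.polyfit(x, target, deg=3), x)

    # Stage 2: per-sample detail offsets optimized by gradient descent on an
    # L2 photometric loss, with a smoothness penalty standing in for the
    # geometric regularization an SDF-based method would use.
    detail = np.zeros_like(x)
    lr, lam = 0.5, 0.1
    for _ in range(500):
        residual = coarse + detail - target
        grad = 2 * residual                                           # photometric term
        grad += lam * np.convolve(detail, [-1, 2, -1], mode="same")   # smoothness term
        detail -= lr * grad

    coarse_err = np.mean((coarse - target) ** 2)
    final_err = np.mean((coarse + detail - target) ** 2)
    print(final_err < coarse_err)  # detail stage reduces photometric error
    ```

    The coarse model captures the low-frequency shape but not the high-frequency detail; the second stage recovers it by optimizing directly against the input signal, mirroring how SIDER refines geometry against the input photograph without extra supervision.
    
    
    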

    SIDER: Single-image neural optimization for facial geometric detail recovery

    Paper presented at the International Conference on Computer Vision (ICCV), held virtually from 11 to 17 October 2021. In this work we present SIDER, a method for high-fidelity detailed 3D face reconstruction from a single image that can be trained in an unsupervised manner. Our approach combines the best of classical statistical models and recent implicit neural representations. The former is used to obtain a coarse shape prior, and the latter provides high-frequency geometric detail by optimizing only over a photometric loss computed w.r.t. the input image. A thorough quantitative and qualitative evaluation shows that SIDER outperforms the current state-of-the-art by a significant margin. A limitation of our current approach is that it still cannot handle details like hair or beards, or accessories such as glasses, because the photometric loss for these regions would require sub-pixel accuracy. In the future, we will explore alternatives for addressing this type of high-frequency detail. This work is partly supported by the Spanish government project MoHuCo PID2020-120049RB-I00 and the María de Maeztu Seal of Excellence MDM-2016-0656. This work was also supported by a gift from Adobe, the Partner University Fund 4DVision Project, and the SUNY2020 Infrastructure Transportation Security Center.